Add eval runner support by potofpie · Pull Request #194 · agentuity/sdk-js

potofpie · 2025-10-06T14:13:33Z

Summary by CodeRabbit

New Features
- Added an evaluation system with run/record capabilities and a background eval job scheduler.
- Exposed SHA-256 hash utilities (async and sync) in the public API.
- Prompt metadata now includes eval references for end-to-end eval tracking.
Bug Fixes
- Improved error reporting and diagnostics when loading prompts.
Tests
- Added unit tests for the new hash utilities.
Chores
- Initialized eval scheduler at startup and expanded public exports.
- Added .DS_Store to .gitignore.

coderabbitai · 2025-10-06T14:14:08Z

Walkthrough

Adds an evaluation subsystem (EvalAPI, EvalJobScheduler), integrates eval execution into request/session lifecycle and server startup, introduces SHA-256 hash utilities and tests, propagates eval identifiers through prompt metadata, and includes minor infra/logging and .gitignore updates.

Changes

Cohort / File(s)	Summary
Project Configuration `\.gitignore`	Adds `.DS_Store` to ignored files
Package Metadata `package.json`	Version bumped 0.0.156 → 0.0.157; no other substantive changes
Service API Extension `src/apis/api.ts`	Extends `ServiceName` with `'eval'` and adds base-url logic preferring `AGENTUITY_EVAL_URL` (fallback to transport URL)
New Eval API `src/apis/eval.ts`	New `EvalAPI` export with types/interfaces and methods to discover evals (by name/id), parse embedded metadata, dynamically import eval modules, execute evals, collect results, and POST run completion
Eval Job Scheduler `src/apis/evaljobscheduler.ts`	New singleton `EvalJobScheduler` export managing `PendingEvalJob` entries with `createJob`, `removeJob`, `getJobs`, `printState`, and cross-context singleton initialization; exports `PendingEvalJob` and `JobFilter`
Prompt Handling `src/apis/prompt/index.ts`	Adds per-path existence checks, per-path try/catch recording `attemptErrors`, richer error reporting when all paths fail, and finer-grained debug logs
Public Exports `src/index.ts`	Re-exports `EvalAPI`, `EvalJobScheduler`, `hash`, `hashSync`; expands public API surface and imports new modules
Router Integration `src/router/context.ts`	Adds private `executeEvalsForSession` and `executeEvalsForJob` to process pending eval jobs post-session completion; integrates EvalAPI and EvalJobScheduler and adds logging
Router Control Flow `src/router/router.ts`	Changes router then-callback to async and explicitly awaits `contextHandler.waitUntilAll`, with logging before/after await
Server Initialization (Bun) `src/server/bun.ts`	Initializes global `EvalJobScheduler` on server start; consolidates imports
Server Initialization (Node) `src/server/node.ts`	Initializes global `EvalJobScheduler` before server creation; imports `isIdle` for idle checks
Hash Utilities `src/utils/hash.ts`	New async `hash(value: string): Promise<string>` (Web Crypto) and sync `hashSync(value: string): string` (Node crypto) with ArrayBuffer→hex helper
Prompt Metadata `src/utils/promptMetadata.ts`	Adds `evals?: string[]` to `PromptAttributesParams`, switches template/compiled hashing to `hashSync`, and propagates `evals` through metadata and storage
Hash Tests `test/utils/hash.test.ts`	New unit tests verifying async/sync hash parity, consistency, distinctness, and empty-string handling
PatchPortal tweak `src/apis/patchportal.ts`	Random segment generation changed from `substr(2,9)` to `substring(2,9)` (affects instanceId only)
Changelog `CHANGELOG.md`	Adds v0.0.157 entry: "Add support for eval running"

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Router
    participant ContextHandler
    participant EvalJobScheduler
    participant EvalAPI
    participant EvalModule
    rect #f0f7ff
    Router->>ContextHandler: handle request -> waitUntilAll(sessionId)
    ContextHandler->>ContextHandler: await pending promises
    end
    ContextHandler->>EvalJobScheduler: getJobs(sessionId)
    EvalJobScheduler-->>ContextHandler: [PendingEvalJob...]
    loop Per job
        ContextHandler->>EvalAPI: loadEvalMetadataMap()
        EvalAPI-->>ContextHandler: Map<slug,id>
        loop Per prompt in job
            ContextHandler->>EvalAPI: runEval(evalName,input,output,sessionId,spanId)
            EvalAPI->>EvalModule: dynamic import (file URL)
            EvalModule-->>EvalAPI: EvalFunction
            EvalAPI->>EvalModule: invoke eval with EvalContext
            EvalModule-->>EvalAPI: EvalRunResult
            EvalAPI->>EvalAPI: POST /evalrun/finish
            EvalAPI-->>ContextHandler: CreateEvalRunResponse
        end
        ContextHandler->>EvalJobScheduler: removeJob(spanId)
    end
    ContextHandler->>ContextHandler: mark session completed

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Add prompt meta data #193 — Modifies prompt metadata and PatchPortal-related logic; strong overlap with prompt metadata and PatchPortal changes in this PR.

Poem

🐰 I hopped through code with nimble feet,

New evals to find and jobs to meet,
Hashes hum secrets, tidy and neat,
Sessions close — then evaluations repeat,
A carrot-cheer for builds complete 🥕

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Title Check	✅ Passed	The pull request title "Add eval runner support" accurately describes the primary objective of the changeset. The PR introduces comprehensive evaluation running capability through multiple new components: the EvalAPI class for loading and executing evals, the EvalJobScheduler singleton for managing evaluation jobs, supporting utilities like SHA-256 hashing, and integration points in the router and server initialization. The title is clear, concise, and specific enough that a teammate reviewing the PR history would immediately understand this adds eval execution functionality. It avoids vague terminology and directly conveys the main enhancement.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch eval-handler

Warning

Review ran into problems

🔥 Problems

Errors were encountered while retrieving linked issues.

Errors (1)

SHA-256: Entity not found: Issue - Could not find referenced Issue.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…handler

jhaynie

need to think about idle detection so we dont suspend process while running evals.

jhaynie · 2025-10-15T12:11:17Z

 		"esbuild": ">=0.25.0"
 	},
 	"dependencies": {
+		"@clickhouse/client": "^1.0.0",


yeah lol my bad need to remove this

jhaynie · 2025-10-15T12:12:47Z


 				const url = new URL(req.url);

+				// Handle eval routes


just make it a normal route handler

removed the route stuff

…handler

coderabbitai

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/apis/prompt/index.ts (1)

95-121: Use CommonJS require() for generated module loading (per project rules)

This loader uses dynamic ESM import with a file URL. Our guidelines mandate require() + fs.existsSync + try/catch for bundler compatibility and runtime loading.

Apply this diff (switch to createRequire + existsSync, preserve cache‑bust via require.cache):

-import fs from 'fs/promises';
+import fs from 'fs/promises';
+import fsSync from 'fs';
 import path from 'path';
-import { pathToFileURL } from 'url';
+import { createRequire } from 'module';
@@
 	public async loadPrompts(): Promise<void> {
@@
-			internal.debug('Trying absolute paths:', possiblePaths);
-			const attemptErrors: string[] = [];
-			for (const possiblePath of possiblePaths) {
-				internal.debug('  Checking:', possiblePath);
-				try {
-					await fs.access(possiblePath);
-					internal.debug('  ✓ File exists:', possiblePath);
-
-					// Get file stats for cache-busting
-					const stats = await fs.stat(possiblePath);
-					const mtime = stats.mtime.getTime();
-
-					// Convert to file URL with cache-busting query param
-					const fileUrl = pathToFileURL(possiblePath).href + `?t=${mtime}`;
-					internal.debug('  Importing from:', fileUrl);
-
-					// Use ESM dynamic import instead of require
-					generatedModule = await import(fileUrl);
-					internal.debug('  ✓ Successfully loaded from:', possiblePath);
-					break;
-				} catch (error) {
-					const errorMsg =
-						error instanceof Error ? error.message : String(error);
-					internal.debug(`  ✗ Failed to load ${possiblePath}: ${errorMsg}`);
-					attemptErrors.push(`${possiblePath}: ${errorMsg}`);
-				}
-			}
+			internal.debug('Trying absolute paths:', possiblePaths);
+			const attemptErrors: string[] = [];
+			const require = createRequire(import.meta.url);
+			for (const possiblePath of possiblePaths) {
+				internal.debug('  Checking:', possiblePath);
+				try {
+					if (!fsSync.existsSync(possiblePath)) {
+						internal.debug('  ✗ File not found:', possiblePath);
+						attemptErrors.push(`${possiblePath}: not found`);
+						continue;
+					}
+					internal.debug('  ✓ File exists:', possiblePath);
+					// Cache-bust require on mtime
+					try {
+						const stats = await fs.stat(possiblePath);
+						const resolved = require.resolve(possiblePath);
+						const cached = require.cache?.[resolved];
+						if (cached && (cached as any).loaded) {
+							delete require.cache[resolved];
+							internal.debug('  ↻ Cleared require cache:', resolved, stats.mtimeMs);
+						}
+					} catch {
+						/* ignore cache clear errors */
+					}
+					generatedModule = require(possiblePath);
+					internal.debug('  ✓ Successfully loaded from:', possiblePath);
+					break;
+				} catch (error) {
+					const errorMsg =
+						error instanceof Error ? error.message : String(error);
+					internal.debug(`  ✗ Failed to load ${possiblePath}: ${errorMsg}`);
+					attemptErrors.push(`${possiblePath}: ${errorMsg}`);
+				}
+			}
@@
-				throw new Error(
-					`Generated prompts file not found. Tried:\n${attemptErrors.join('\n')}`
-				);
+				throw new Error(
+					`Generated prompts file not found. Tried:\n${attemptErrors.join('\n')}`
+				);

As per coding guidelines / Based on learnings.

src/utils/promptMetadata.ts (1)

18-18: Remove stray expression statement [];

This no-op is likely accidental and can confuse readers.

Apply:
-[];

🧹 Nitpick comments (18)

src/server/internal-routes.ts (3)
16-18: Match '/_agentuity' with or without trailing slash and strip query.

Current check misses '/_agentuity' and paths with query. Parse pathname first.
-	isInternalRoute(url: string): boolean {
-		return url.startsWith('/_agentuity/');
-	}
+	isInternalRoute(url: string): boolean {
+		const pathname = url.startsWith('http')
+			? new URL(url).pathname
+			: url.split('?')[0];
+		return pathname === '/_agentuity' || pathname.startsWith('/_agentuity/');
+	}
31-32: Return structured 404 with content-type.

Helps clients and avoids text bodies.
-		return new Response('Internal route not found', { status: 404 });
+		return new Response(JSON.stringify({ error: 'not_found' }), {
+			status: 404,
+			headers: { 'content-type': 'application/json' },
+		});
9-11: Keep the logger for future, but store it for consistency.

Small polish; no behavior change.
-export class InternalRoutesHandler {
-	constructor(_logger: Logger) {
-		// Logger parameter kept for future use
-	}
+export class InternalRoutesHandler {
+	constructor(private readonly logger: Logger) {}
src/apis/evaljobscheduler.ts (3)
30-35: Use slice instead of deprecated substr.

Avoid deprecated String.prototype.substr.
- this.instanceId = `EvalJobScheduler-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`;
+ this.instanceId = `EvalJobScheduler-${Date.now()}-${Math.random()
+  .toString(36)
+  .slice(2, 11)}`;
40-60: getInstance doesn’t need to be async.

No awaits present; returning Promise adds overhead for callers.
- public static async getInstance(): Promise<EvalJobScheduler> {
+ public static getInstance(): EvalJobScheduler {
   // ...unchanged...
-  return globalThis.__evalJobSchedulerInstance;
+  return globalThis.__evalJobSchedulerInstance!;
 }
Note: adjust call sites accordingly.

99-104: Consider TTL or max-capacity to prevent unbounded growth.

Add eviction (e.g., max jobs per session, time-based cleanup).
src/apis/eval.ts (6)
248-289: Validate and normalize score/pass results before emitting.

Clamp score to [0,1].

Keep comment and behavior aligned.
- score: (
-  val: number,
+ score: (
+  val: number,
   meta?: { reasoning?: string; [key: string]: unknown }
 ) => {
+  const score = Math.max(0, Math.min(1, val));
   createEvalRunRequest = {
     projectId,
     sessionId,
     spanId,
     result: {
       success: true,
-      score: val,
+      score,
       metadata: {
         reason: meta?.reasoning || '',
         ...meta,
       },
     },
     evalId: metadata.id,
     promptHash,
   };
 },
225-233: Guard missing projectId.

If AGENTUITY_CLOUD_PROJECT_ID is unset, fail fast or warn loudly.
- const projectId = process.env.AGENTUITY_CLOUD_PROJECT_ID || '';
+ const projectId = process.env.AGENTUITY_CLOUD_PROJECT_ID || '';
+ if (!projectId) {
+   internal.warn('AGENTUITY_CLOUD_PROJECT_ID is not set; eval runs may not persist.');
+ }
334-335: Use slice instead of deprecated substr in ID generation.
- id: `local-${Date.now()}-${Math.random().toString(36).substr(2, 9)}`,
+ id: `local-${Date.now()}-${Math.random().toString(36).slice(2, 11)}`,
410-417: Make metadata regex resilient to type annotations.

Handles export const metadata: Type = { ... }.
- const metadataRegex = /export\s+const\s+metadata\s*=\s*\{/;
+ const metadataRegex =
+  /export\s+const\s+metadata(?:\s*:\s*[^=]+)?\s*=\s*\{/;
445-463: String-munging JSON is brittle.

Quotes in values, nested objects, or keys with hyphens will break. Prefer importing the module and reading metadata or use a tiny parser for the export.

Would you like me to replace this with a safe static analysis using a parser (e.g., acorn) or switch to dynamic import for metadata only?

117-152: Sequential dynamic imports may be slow for many evals.

Optional: parallelize with Promise.allSettled on filtered files, then pick the first match.

Also applies to: 170-205
src/router/router.ts (1)

428-438: Re-check pending header semantics.

Since you now await waitUntilAll, hasPending() should usually be false; consider removing the header or gating wait by config.

Would you like me to add a config flag (e.g., awaitWaitUntilAll: boolean) to preserve previous non-blocking behavior?
src/server/bun.ts (2)
65-67: Safer global leakage for internal logger

Avoid a plain string property on globalThis. Prefer a Symbol or a unique namespace object to reduce collision risk.

Apply this diff:
-	(globalThis as any).__agentuityInternalLogger = logger;
+	const kInternalLogger = Symbol.for('@agentuity/internalLogger');
+	(globalThis as any)[kInternalLogger] = logger;
178-191: Internal routes: add CORS, generate runId, and guard body

Add '*' CORS header for parity with Node handler.

Pass a generated runId (helps downstream logging/tracing).

Guard req.body (can be null in Bun).

Apply this diff:
-				// Handle internal routes first
-				if (url.pathname.startsWith('/_agentuity/')) {
-					const internalResponse = await internalRoutes.handleInternalRoute({
-						method,
-						url: url.pathname,
-						headers: req.headers.toJSON(),
-						body: req.body as unknown as ReadableStream<ReadableDataType>,
-						request: getRequestFromHeaders(req.headers.toJSON(), ''),
-						setTimeout: (_val: number) => void 0,
-					});
-					if (internalResponse) {
-						return internalResponse;
-					}
-				}
+				// Handle internal routes first
+				if (url.pathname.startsWith('/_agentuity/')) {
+					const runId =
+						globalThis.crypto?.randomUUID?.() ??
+						Date.now().toString(36);
+					const internalResponse = await internalRoutes.handleInternalRoute({
+						method,
+						url: url.pathname,
+						headers: req.headers.toJSON(),
+						body:
+							(req.body ??
+								undefined) as unknown as ReadableStream<ReadableDataType>,
+						request: getRequestFromHeaders(req.headers.toJSON(), runId),
+						setTimeout: (_val: number) => void 0,
+					});
+					if (internalResponse) {
+						const headers = new Headers(internalResponse.headers);
+						if (!headers.has('Access-Control-Allow-Origin')) {
+							headers.set('Access-Control-Allow-Origin', '*');
+						}
+						return new Response(internalResponse.body, {
+							status: internalResponse.status,
+							headers,
+						});
+					}
+				}
src/index.ts (1)
1-1: ESM path consistency for bundler/NodeNext

New imports/exports should include “.js” to match existing style (e.g., promptMetadata, patchportal) and avoid runtime resolution issues under node16/nodenext.

Apply:
-export * from './apis/eval';
+export * from './apis/eval.js';
@@
-export { hash, hashSync } from './utils/hash';
+export { hash, hashSync } from './utils/hash.js';
@@
-import EvalAPI from './apis/eval';
-import EvalJobScheduler from './apis/evaljobscheduler';
+import EvalAPI from './apis/eval.js';
+import EvalJobScheduler from './apis/evaljobscheduler.js';
As per coding guidelines.

Also applies to: 5-5, 16-17
src/utils/promptMetadata.ts (1)
75-83: Unify hashing and fix typos (comiledPrompts) in concat path

Use hashSync instead of direct crypto for consistency.

Rename comiledPrompts → compiledPrompts.

Drop the unused crypto import afterwards.

Apply:
-export async function processPromptMetadataConcat(
-	comiledPrompts: string[]
-): Promise<string> {
-	const compiledPromptsString = comiledPrompts.join('');
+export async function processPromptMetadataConcat(
+	compiledPrompts: string[]
+): Promise<string> {
+	const compiledPromptsString = compiledPrompts.join('');
@@
-		comiledPrompts.map(async (compiledPrompt) => {
+		compiledPrompts.map(async (compiledPrompt) => {
@@
-	const newCompiledHash = crypto
-		.createHash('sha256')
-		.update(comiledPrompts.join(''))
-		.digest('hex');
+	const newCompiledHash = hashSync(compiledPrompts.join(''));
Additionally remove the top-level crypto import if now unused:
-import crypto from 'crypto';
As per coding guidelines.

Also applies to: 97-101
src/router/context.ts (1)
134-139: Validate eval slugs using loaded metadata

You load evalMetadataMap but never use it. Validate slugs to avoid futile runEval calls and clearer logs.

Apply:
@@
-			const evalMetadataMap = await evalAPI.loadEvalMetadataMap();
+			const evalMetadataMap = await evalAPI.loadEvalMetadataMap();
 			logger.info(`📚 Loaded ${evalMetadataMap.size} eval mappings`);
@@
-			for (const evalSlug of promptMeta.evals) {
+			for (const evalSlug of promptMeta.evals) {
+				if (!evalMetadataMap.has(evalSlug)) {
+					logger.warn(
+						`⏭️  Unknown eval '${evalSlug}' — skipping (not in metadata map)`
+					);
+					continue;
+				}
 				try {
Also applies to: 205-206

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5bd84ff and cba15da.

⛔ Files ignored due to path filters (1)

package-lock.json is excluded by !**/package-lock.json

📒 Files selected for processing (15)

.gitignore (1 hunks)
package.json (2 hunks)
src/apis/api.ts (3 hunks)
src/apis/eval.ts (1 hunks)
src/apis/evaljobscheduler.ts (1 hunks)
src/apis/prompt/index.ts (1 hunks)
src/index.ts (3 hunks)
src/router/context.ts (2 hunks)
src/router/router.ts (1 hunks)
src/server/bun.ts (3 hunks)
src/server/internal-routes.ts (1 hunks)
src/server/node.ts (3 hunks)
src/utils/hash.ts (1 hunks)
src/utils/promptMetadata.ts (3 hunks)
test/utils/hash.test.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (10)

{src,test}/**/!(*.d).ts

📄 CodeRabbit inference engine (AGENT.md)

{src,test}/**/!(*.d).ts: Use strict TypeScript and prefer unknown over any
Use ESM import/export syntax; avoid CommonJS require/module.exports
Use relative imports for internal modules
Keep imports organized (sorted, no unused imports)
Use tabs with a visual width of 2 spaces
Limit lines to a maximum of 80 characters
Use single quotes for strings
Use proper Error types; do not throw strings
Prefer template literals over string concatenation

Files:

src/utils/hash.ts
src/router/router.ts
src/apis/prompt/index.ts
src/server/internal-routes.ts
src/utils/promptMetadata.ts
src/apis/eval.ts
src/server/bun.ts
src/apis/evaljobscheduler.ts
src/server/node.ts
src/apis/api.ts
test/utils/hash.test.ts
src/index.ts
src/router/context.ts

src/router/**

📄 CodeRabbit inference engine (AGENT.md)

Router code (request/response handling, streaming) resides in src/router/

Files:

src/router/router.ts
src/router/context.ts

src/apis/**

📄 CodeRabbit inference engine (AGENT.md)

Place core API implementations under src/apis/ (email, discord, keyvalue, vector, objectstore)

Files:

src/apis/prompt/index.ts
src/apis/eval.ts
src/apis/evaljobscheduler.ts
src/apis/api.ts

src/apis/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/apis/**/*.ts: Do not hardcode generated prompt content (e.g., copyWriter) in source files; load it dynamically
Avoid overly complex TypeScript generics for generated content; prefer simple, maintainable types
Maintain type safety for dynamically loaded content by generating and referencing TypeScript definitions, with proper annotations for require() results
Do not use relative imports to generated artifacts; resolve via absolute node_modules paths or the package entry
Generated content is loaded at runtime, not build time; avoid static imports of generated modules
Prefer bracket notation for accessing slug-named properties with hyphens (e.g., prompts['slug-name'])
Avoid relative require('./generated/_index.js'); resolve absolute paths from process.cwd() into node_modules for generated assets

Files:

src/apis/prompt/index.ts
src/apis/eval.ts
src/apis/evaljobscheduler.ts
src/apis/api.ts

src/apis/prompt/index.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/apis/prompt/index.ts: Use dynamic loading with absolute paths; check dist first then src, and provide fallbacks when generated content is missing
Use require() (CommonJS) to load generated modules at runtime rather than import(), ensuring bundler compatibility
When loading generated modules, guard with fs.existsSync and wrap require() in try/catch; default to safe content on failure

Files:

src/apis/prompt/index.ts

src/apis/prompt/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/apis/prompt/**/*.ts: In TS interfaces that include hyphenated slugs, quote property names (e.g., 'slug-name') and access via bracket notation
Generated content must import from @agentuity/sdk rather than relative paths

Files:

src/apis/prompt/index.ts

src/server/*.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/server/*.ts: Load generated content during server context creation and provide fallbacks when signatures or content are missing
In compile helpers, prefer using generated signature functions when present and provide a manual compilation fallback when absent

Files:

src/server/internal-routes.ts
src/server/bun.ts
src/server/node.ts

src/server/{server,bun,node,agents}.ts

📄 CodeRabbit inference engine (AGENT.md)

Server components live in src/server/ as server.ts, bun.ts, node.ts, and agents.ts

Files:

src/server/bun.ts
src/server/node.ts

test/**

📄 CodeRabbit inference engine (AGENT.md)

Tests must mirror the source structure under the test/ directory

Files:

test/utils/hash.test.ts

src/index.ts

📄 CodeRabbit inference engine (AGENT.md)

src/index.ts must be the entry point and export server, logger, types, and APIs

Always export interpolateTemplate from the main SDK index

Files:

src/index.ts

🧠 Learnings (7)

📚 Learning: 2025-09-24T18:33:34.725Z

Learnt from: jhaynie
PR: agentuity/sdk-js#179
File: src/router/context.ts:36-58
Timestamp: 2025-09-24T18:33:34.725Z
Learning: In the AgentContextWaitUntilHandler class in src/router/context.ts, the running counter should only be decremented in waitUntilAll, not when individual promises complete. The design intent is that running represents work that hasn't been fully processed as a batch, maintaining the collective nature of the wait-until pattern where a session isn't done until all deferred work is processed together.

Applied to files:

src/router/router.ts

📚 Learning: 2025-10-03T22:20:06.957Z

Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/prompt/index.ts : When loading generated modules, guard with fs.existsSync and wrap require() in try/catch; default to safe content on failure

Applied to files:

src/apis/prompt/index.ts

📚 Learning: 2025-10-03T22:20:06.957Z

Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/prompt/index.ts : Use dynamic loading with absolute paths; check dist first then src, and provide fallbacks when generated content is missing

Applied to files:

src/apis/prompt/index.ts

📚 Learning: 2025-10-03T22:20:06.957Z

Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/prompt/index.ts : Use require() (CommonJS) to load generated modules at runtime rather than import(), ensuring bundler compatibility

Applied to files:

src/apis/prompt/index.ts

📚 Learning: 2025-10-03T22:20:06.957Z

Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/**/*.ts : Do not hardcode generated prompt content (e.g., copyWriter) in source files; load it dynamically

Applied to files:

src/apis/prompt/index.ts

📚 Learning: 2025-10-03T22:20:06.957Z

Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/prompt/**/*.ts : Generated content must import from agentuity/sdk rather than relative paths

Applied to files:

src/apis/prompt/index.ts

📚 Learning: 2025-10-03T22:20:06.957Z

Learnt from: CR
PR: agentuity/sdk-js#0
File: .cursor/rules/code-generation.mdc:0-0
Timestamp: 2025-10-03T22:20:06.957Z
Learning: Applies to src/apis/**/*.ts : Avoid relative require('./generated/_index.js'); resolve absolute paths from process.cwd() into node_modules for generated assets

Applied to files:

src/apis/prompt/index.ts

🧬 Code graph analysis (11)

src/utils/hash.ts (2)

src/router/data.ts (1)

arrayBuffer (188-191)

src/index.ts (2)

hash (5-5)

hashSync (5-5)

src/router/router.ts (2)

src/logger/index.ts (1)

logger (3-3)

src/logger/user.ts (1)

logger (8-8)

src/apis/prompt/index.ts (1)

src/logger/internal.ts (2)

internal (178-194)

error (128-132)

src/server/internal-routes.ts (1)

src/server/types.ts (1)

ServerRequest (23-31)

src/utils/promptMetadata.ts (2)

src/apis/patchportal.ts (1)

PatchPortal (11-125)

src/utils/hash.ts (1)

hashSync (27-29)

src/apis/eval.ts (1)

src/apis/api.ts (1)

POST (254-271)

src/server/bun.ts (4)

src/apis/evaljobscheduler.ts (1)

EvalJobScheduler (24-173)

src/server/internal-routes.ts (1)

InternalRoutesHandler (8-33)

src/types.ts (1)

ReadableDataType (134-139)

src/server/util.ts (1)

getRequestFromHeaders (337-360)

src/apis/evaljobscheduler.ts (1)

src/index.ts (2)

EvalJobScheduler (25-25)

PromptAttributes (8-8)

src/server/node.ts (3)

src/apis/evaljobscheduler.ts (1)

EvalJobScheduler (24-173)

src/server/internal-routes.ts (1)

InternalRoutesHandler (8-33)

src/server/util.ts (1)

getRequestFromHeaders (337-360)

test/utils/hash.test.ts (2)

src/index.ts (2)

hash (5-5)

hashSync (5-5)

src/utils/hash.ts (2)

hash (15-21)

hashSync (27-29)

src/router/context.ts (4)

src/apis/session.ts (1)

markSessionCompleted (6-18)

src/apis/evaljobscheduler.ts (1)

EvalJobScheduler (24-173)

src/apis/eval.ts (1)

EvalAPI (89-465)

src/utils/promptMetadata.ts (1)

PromptAttributes (14-17)

🔇 Additional comments (11)

.gitignore (1)

6-7: LGTM!

Adding .DS_Store to the ignore list is a sensible addition for macOS development environments, preventing system metadata files from being accidentally committed.

package.json (1)

82-82: @clickhouse/client is not currently imported or used in the codebase.

The verification found no source code imports of @clickhouse/client anywhere in the repository. The package appears only in package.json (line 82) and package-lock.json as a direct dependency, but no actual usage exists.

The original review concerns about server-side guards, browser bundling impact, and dynamic import patterns are not applicable until the package is actually used in code. If this package was added as a placeholder for future use, import it first; if it was added accidentally, remove it.

No security vulnerabilities were found for the package.

Likely an incorrect or invalid review comment.

src/apis/api.ts (2)

32-37: ServiceName union: looks good

The union cleanly adds 'eval' and propagates through call sites.

110-113: Eval base URL case: OK

Resolves AGENTUITY_EVAL_URL with sensible fallback to transport URL.

src/server/bun.ts (1)

68-70: Scheduler init at startup: good

Singleton created early; avoids lazy-init races.

src/server/node.ts (1)

110-114: Scheduler init at startup: good

Keeps singleton alive across requests.

test/utils/hash.test.ts (1)

1-61: Hash tests: solid coverage

Parities and edge cases covered; assertions are precise.

src/utils/promptMetadata.ts (2)

11-11: Evals field propagation looks good

Adding evals?: string[] to PromptAttributesParams is reasonable and non-breaking.

38-42: Good: centralized hashing via hashSync

Using hashSync for template/compiled hashes improves consistency and testability.

src/router/context.ts (1)

211-221: Confirm which hash EvalAPI expects (template vs compiled)

You pass templateHash to runEval(promptHash?). Should this be compiledHash instead? Verify API semantics for downstream correlation.

Would you like me to scan call sites and API docs to confirm and update accordingly?

src/utils/hash.ts (1)

1-1: Review comment is incorrect; no changes needed.

The package description explicitly states this is "The Agentuity SDK for NodeJS and Bun" with Node.js >= 22 as the minimum version. In Node.js 22, import crypto from 'crypto' correctly exposes crypto.subtle, and the current implementation is valid. The review comment incorrectly assumes browser compatibility is required and suggests unnecessary changes (globalThis fallback) for a server-only SDK. The codebase is correct as-is.

Likely an incorrect or invalid review comment.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (3)

src/apis/evaljobscheduler.ts (2)
119-149: Logging uses debug level appropriately.

The method logs job details at internal.debug level throughout, which is appropriate for detailed operational data. This addresses the concern from the previous review about info-level logging.

154-168: printState still dumps full jobs array with potential sensitive data.

The method continues to serialize complete job objects without redaction. Prompt metadata may contain sensitive inputs, outputs, or other PII.

Apply the previously suggested fix:
-	public printState(): void {
-		internal.debug('🔍 EvalJobScheduler.printState() called');
-		internal.debug('🔍 Instance ID:', this.instanceId);
-		internal.debug('🔍 EvalJobScheduler State:');
-		internal.debug('📊 Total jobs:', this.pendingJobs.size);
-		internal.debug('📋 Job IDs:', Array.from(this.pendingJobs.keys()));
-		internal.debug(
-			'📦 All jobs:',
-			JSON.stringify(Array.from(this.pendingJobs.values()), null, 2)
-		);
-		internal.debug(
-			'🔍 Global instance check:',
-			globalThis.__evalJobSchedulerInstance === this
-		);
-	}
+	public printState(): void {
+		internal.debug('🔍 EvalJobScheduler.printState() called');
+		internal.debug('🔍 Instance ID:', this.instanceId);
+		internal.debug('🔍 EvalJobScheduler State:');
+		internal.debug('📊 Total jobs:', this.pendingJobs.size);
+		internal.debug('📋 Job IDs:', Array.from(this.pendingJobs.keys()));
+		if (process.env.AGENTUITY_DEBUG_EVALS === '1') {
+			internal.debug(
+				'📦 All jobs (redacted):',
+				JSON.stringify(
+					Array.from(this.pendingJobs.values()).map(j => ({
+						spanId: j.spanId,
+						sessionId: j.sessionId,
+						promptMetadataCount: j.promptMetadata?.length ?? 0,
+						hasOutput: !!j.output,
+						createdAt: j.createdAt,
+					})),
+					null,
+					2
+				)
+			);
+		}
+		internal.debug(
+			'🔍 Global instance check:',
+			globalThis.__evalJobSchedulerInstance === this
+		);
+	}
src/router/context.ts (1)
82-86: Zero-promises branch still missing markSessionCompleted call.

The early-return path executes evals but never marks the session completed, leaving the session in an incomplete state from a tracking perspective. This inconsistency with the Promise.all branch (line 96) means metrics and downstream consumers won't see the session end.

Apply this fix:
 		if (this.promises.length === 0) {
 			internal.debug('No promises to wait for, executing evals directly');
+			await markSessionCompleted(sessionId, 0);
 			await this.executeEvalsForSession(logger, sessionId);
 			return;
 		}

🧹 Nitpick comments (2)

src/apis/evaljobscheduler.ts (1)
40-60: Consider making getInstance() synchronous.

The method is declared async but performs no asynchronous operations. Unless async initialization is planned for the future, this could be simplified to a synchronous method returning EvalJobScheduler directly.

If desired, apply this diff:
-	public static async getInstance(): Promise<EvalJobScheduler> {
+	public static getInstance(): EvalJobScheduler {
 		internal.debug('🔍 EvalJobScheduler.getInstance() called');
 		internal.debug(
 			'🔍 globalThis.__evalJobSchedulerInstance exists:',
 			!!globalThis.__evalJobSchedulerInstance
 		);

 		if (!globalThis.__evalJobSchedulerInstance) {
 			globalThis.__evalJobSchedulerInstance = new EvalJobScheduler();
 			internal.debug(
 				'🆕 Created new EvalJobScheduler instance, ID:',
 				globalThis.__evalJobSchedulerInstance.instanceId
 			);
 		} else {
 			internal.debug(
 				'♻️ Returning existing EvalJobScheduler instance, ID:',
 				globalThis.__evalJobSchedulerInstance.instanceId
 			);
 		}
 		return globalThis.__evalJobSchedulerInstance;
 	}
src/router/context.ts (1)

112-174: Consider reusing EvalAPI instance or document stateless design.

A new EvalAPI instance is created for each session (line 138). If EvalAPI is stateless and lightweight, this is fine. Otherwise, consider reusing a single instance or caching the evalMetadataMap to avoid repeated file system scans.

Also, ensure cleanup failures are non-critical: the removeJob() loop (lines 167-169) doesn't check return values, which is acceptable if job removal is best-effort.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1fbb376 and 0dd2845.

📒 Files selected for processing (4)

src/apis/api.ts (2 hunks)
src/apis/eval.ts (1 hunks)
src/apis/evaljobscheduler.ts (1 hunks)
src/router/context.ts (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/apis/eval.ts
src/apis/api.ts

🧰 Additional context used

📓 Path-based instructions (4)

src/router/**

📄 CodeRabbit inference engine (AGENT.md)

Router code (request/response handling, streaming) resides in src/router/

Files:

src/router/context.ts

{src,test}/**/!(*.d).ts

📄 CodeRabbit inference engine (AGENT.md)

{src,test}/**/!(*.d).ts: Use strict TypeScript and prefer unknown over any
Use ESM import/export syntax; avoid CommonJS require/module.exports
Use relative imports for internal modules
Keep imports organized (sorted, no unused imports)
Use tabs with a visual width of 2 spaces
Limit lines to a maximum of 80 characters
Use single quotes for strings
Use proper Error types; do not throw strings
Prefer template literals over string concatenation

Files:

src/router/context.ts
src/apis/evaljobscheduler.ts

src/apis/**

📄 CodeRabbit inference engine (AGENT.md)

Place core API implementations under src/apis/ (email, discord, keyvalue, vector, objectstore)

Files:

src/apis/evaljobscheduler.ts

src/apis/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/apis/**/*.ts: Do not hardcode generated prompt content (e.g., copyWriter) in source files; load it dynamically
Avoid overly complex TypeScript generics for generated content; prefer simple, maintainable types
Maintain type safety for dynamically loaded content by generating and referencing TypeScript definitions, with proper annotations for require() results
Do not use relative imports to generated artifacts; resolve via absolute node_modules paths or the package entry
Generated content is loaded at runtime, not build time; avoid static imports of generated modules
Prefer bracket notation for accessing slug-named properties with hyphens (e.g., prompts['slug-name'])
Avoid relative require('./generated/_index.js'); resolve absolute paths from process.cwd() into node_modules for generated assets

Files:

src/apis/evaljobscheduler.ts

🧬 Code graph analysis (1)

src/router/context.ts (4)

src/apis/session.ts (1)

markSessionCompleted (6-18)

src/apis/evaljobscheduler.ts (1)

EvalJobScheduler (24-176)

src/apis/eval.ts (1)

EvalAPI (89-465)

src/utils/promptMetadata.ts (1)

PromptAttributes (14-17)

🔇 Additional comments (4)

src/apis/evaljobscheduler.ts (2)

65-104: LGTM: Job creation logic is sound.

The method correctly creates or overwrites jobs, counts evals from metadata, and maintains consistent state. Debug logging provides good observability.

109-114: LGTM: Removal logic is straightforward.

The method correctly removes jobs and returns the deletion status.

src/router/context.ts (2)

1-12: LGTM: Import additions are appropriate.

The new imports (trace, EvalAPI, EvalJobScheduler, internal logger, PromptAttributes) properly support the eval execution functionality added to this file.

88-107: LGTM: Promise.all branch properly sequences completion and eval execution.

The flow correctly waits for all promises, marks the session completed with accurate duration, executes evals post-completion, and handles cleanup.

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (2)

src/apis/eval.ts (2)
29-33: Replace any with unknown in index signature.

This issue was already flagged in a previous review. The index signature should use unknown to align with strict TypeScript guidelines, and the biome-ignore comment should be removed.

317-331: Improve HTTP status validation to check 2xx range.

The current check if (!resp.status) only verifies that a status exists, not whether it indicates success. A proper HTTP status check should validate the 2xx range (200-299).

Apply this diff:
-if (!resp.status) {
-	internal.debug('Failed to update the database with the eval run', {
+if (!resp || typeof resp.status !== 'number' || resp.status < 200 || resp.status >= 300) {
+	internal.debug('Failed to update the database with the eval run', {
 		status: resp.status,
 		evalName,
 		evalId: r.evalId,
 		sessionId: r.sessionId,
 	});
 } else {
 	internal.debug('Eval run updated in the database', {
 		status: resp.status,
 		evalName,
 		evalId: r.evalId,
 		sessionId: r.sessionId,
 	});
 }

🧹 Nitpick comments (1)

src/apis/eval.ts (1)

111-158: Consider extracting shared logic to reduce duplication.

Both loadEvalByName and loadEvalById share nearly identical file-scanning, filtering, and dynamic-import logic. Only the metadata matching criterion differs (slug vs. id). Extracting a private helper method would improve maintainability.

Example refactor:

private async findEvalByPredicate(
	predicate: (metadata: any) => boolean
): Promise<{
	evalFn: EvalFunction;
	metadata: { id: string; slug: string; name: string; description: string };
}> {
	const files = fs.readdirSync(this.evalsDir);
	const expectedExt = this.isBundled ? '.js' : '.ts';

	for (const file of files) {
		if (file === 'index.ts' || file === 'index.js') continue;
		if (!file.endsWith(expectedExt)) continue;

		const filePath = path.join(this.evalsDir, file);
		try {
			const fileUrl = pathToFileURL(filePath).href;
			const module = await import(fileUrl);

			if (module.metadata && predicate(module.metadata)) {
				return { evalFn: module.default, metadata: module.metadata };
			}
		} catch (error) {
			internal.debug(`Skipping file ${file} due to import error: ${error}`);
		}
	}
	throw new Error('No eval found matching predicate');
}

async loadEvalByName(evalName: string) {
	return this.findEvalByPredicate(m => m.slug === evalName);
}

async loadEvalById(evalId: string) {
	return this.findEvalByPredicate(m => m.id === evalId);
}

Also applies to: 164-211

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0dd2845 and 55860f0.

📒 Files selected for processing (2)

src/apis/eval.ts (1 hunks)
src/apis/patchportal.ts (1 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

src/apis/**

📄 CodeRabbit inference engine (AGENT.md)

Place core API implementations under src/apis/ (email, discord, keyvalue, vector, objectstore)

Files:

src/apis/patchportal.ts
src/apis/eval.ts

{src,test}/**/!(*.d).ts

📄 CodeRabbit inference engine (AGENT.md)

{src,test}/**/!(*.d).ts: Use strict TypeScript and prefer unknown over any
Use ESM import/export syntax; avoid CommonJS require/module.exports
Use relative imports for internal modules
Keep imports organized (sorted, no unused imports)
Use tabs with a visual width of 2 spaces
Limit lines to a maximum of 80 characters
Use single quotes for strings
Use proper Error types; do not throw strings
Prefer template literals over string concatenation

Files:

src/apis/patchportal.ts
src/apis/eval.ts

src/apis/**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/code-generation.mdc)

src/apis/**/*.ts: Do not hardcode generated prompt content (e.g., copyWriter) in source files; load it dynamically
Avoid overly complex TypeScript generics for generated content; prefer simple, maintainable types
Maintain type safety for dynamically loaded content by generating and referencing TypeScript definitions, with proper annotations for require() results
Do not use relative imports to generated artifacts; resolve via absolute node_modules paths or the package entry
Generated content is loaded at runtime, not build time; avoid static imports of generated modules
Prefer bracket notation for accessing slug-named properties with hyphens (e.g., prompts['slug-name'])
Avoid relative require('./generated/_index.js'); resolve absolute paths from process.cwd() into node_modules for generated assets

Files:

src/apis/patchportal.ts
src/apis/eval.ts

🧬 Code graph analysis (1)

src/apis/eval.ts (1)

src/apis/api.ts (1)

POST (253-270)

🔇 Additional comments (1)

src/apis/patchportal.ts (1)

17-17: LGTM! Good migration from deprecated substr to substring.

The behavior is identical—both extract 9 characters starting at position 2—and this change aligns with modern JavaScript standards.

potofpie added 11 commits October 3, 2025 08:31

added new sig route 2

6c2c8c2

ts seems good onto js

637b9bc

stuff is working

a67a5e7

added route 2

5995f8a

remove unused file

eb039d3

fix tests

1629bbf

added fix

ed0a536

added version

db0db36

added patching logic

966137f

added a new logger

5844b06

add eval execution

a854142

potofpie changed the base branch from main to add-prompt-meta-data October 6, 2025 14:14

Base automatically changed from add-prompt-meta-data to main October 8, 2025 18:35

potofpie added 4 commits October 13, 2025 10:57

Merge branch 'main' of https://github.com/agentuity/sdk-js into eval-…

229a03b

…handler

aded eval in metadata

d7b4066

added some changes

d74c180

merge conflict

788cc7d

jhaynie reviewed Oct 15, 2025

View reviewed changes

potofpie added 7 commits October 15, 2025 14:47

added eval run

19b93ed

add all changes for review

4f0187c

cleaning up

42eeb68

added eval job scheduler

073adeb

added eval job scheduler

e5d730e

evals are running again

084b3ae

added rest of the important shit

cba15da

potofpie marked this pull request as ready for review October 21, 2025 20:52

Merge branch 'main' of https://github.com/agentuity/sdk-js into eval-…

ca316d0

…handler

coderabbitai Bot reviewed Oct 21, 2025

View reviewed changes

remove dev change

232fb4b

potofpie added 2 commits October 22, 2025 11:38

remove route changes

1fbb376

code review

0dd2845

coderabbitai Bot reviewed Oct 22, 2025

View reviewed changes

Comment thread src/apis/evaljobscheduler.ts

Comment thread src/router/context.ts

code review

55860f0

coderabbitai Bot reviewed Oct 22, 2025

View reviewed changes

Comment thread src/apis/eval.ts

added the version bump

3c4e1d8

potofpie changed the title ~~Eval handler~~ Add eval runner support Oct 22, 2025

potofpie merged commit 95347fe into main Oct 22, 2025
3 checks passed

potofpie deleted the eval-handler branch October 22, 2025 18:51

This was referenced Oct 22, 2025

Update changelog for sdk-js v0.0.157 #204

Closed

Update changelog for sdk-js v0.0.157 agentuity/docs#318

Closed

Conversation

potofpie commented Oct 6, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Poem

Pre-merge checks and finishing touches

Review ran into problems

Uh oh!

jhaynie left a comment

Choose a reason for hiding this comment

Uh oh!

jhaynie Oct 15, 2025

Choose a reason for hiding this comment

Uh oh!

potofpie Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

jhaynie Oct 15, 2025

Choose a reason for hiding this comment

Uh oh!

potofpie Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

potofpie commented Oct 6, 2025 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Oct 6, 2025 •

edited

Loading